Variational Planning for Graph-based MDPs

Authors

Qiang Cheng

Qiang Liu

Feng Chen

Alexander T. Ihler

Abstract

Markov Decision Processes (MDPs) are extremely useful for modeling and solving sequential decision-making problems. Graph-based MDPs provide a compact representation for MDPs with large numbers of random variables. However, the complexity of exactly solving a graph-based MDP usually grows exponentially in the number of variables, which limits their application. We present a new variational framework to describe and solve the planning problem of MDPs, and derive both exact and approximate planning algorithms. In particular, by exploiting the graph structure of graph-based MDPs, we propose a factored variational value iteration algorithm in which the value function is first approximated by a product of local-scope value functions, then solved by minimizing a Kullback-Leibler (KL) divergence. The KL divergence is optimized using the belief propagation algorithm, with complexity exponential in only the cluster size of the graph. Experimental comparison on different models shows that our algorithm outperforms existing approximation algorithms at finding good policies.
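For orientation, the exact planning baseline that the abstract contrasts against can be sketched as ordinary tabular value iteration. This is a minimal illustration under stated assumptions, not the factored variational algorithm of the paper: the function name `value_iteration`, the array layout of `P` and `R`, and the two-state toy problem are all hypothetical choices made for this example. The table `V` has one entry per joint state, which is why the exact approach becomes infeasible as the number of state variables grows.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Standard tabular value iteration (illustrative, not the paper's method).

    P: array of shape (A, S, S); P[a, s, t] is the probability of moving
       from state s to state t under action a (assumed layout).
    R: array of shape (S, A); R[s, a] is the immediate reward (assumed layout).
    Returns the value function V (shape S) and a greedy policy (shape S).
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Extract the greedy policy with respect to the final value estimate.
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


# Hypothetical toy problem: action 0 stays put, action 1 switches state;
# being in state 1 pays reward 1 regardless of the action taken.
P_toy = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[0.0, 1.0], [1.0, 0.0]]])
R_toy = np.array([[0.0, 0.0],
                  [1.0, 1.0]])
V_toy, policy_toy = value_iteration(P_toy, R_toy, gamma=0.5)
```

On this toy problem the values approach 1 for state 0 and 2 for state 1, and the greedy policy switches out of state 0 and stays in state 1. With n binary state variables the same table would need 2^n entries, which is the exponential cost that factored approximations aim to avoid.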

Related resources

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...

Graph Convergence for H(.,.)-co-Accretive Mapping with over-Relaxed Proximal Point Method for Solving a Generalized Variational Inclusion Problem

In this paper, we use the concept of graph convergence of H(.,.)-co-accretive mapping introduced by [R. Ahmad, M. Akram, M. Dilshad, Graph convergence for the H(.,.)-co-accretive mapping with an application, Bull. Malays. Math. Sci. Soc., doi: 10.1007/s40840-014-0103-z, 2014] and define an over-relaxed proximal point method to obtain the solution of a generalized variational inclusion problem ...

Extending Classical Planning Heuristics to Probabilistic Planning with Dead-Ends

Recent domain-determinization techniques have been very successful in many probabilistic planning problems. We claim that traditional heuristic MDP algorithms have been unsuccessful due mostly to the lack of efficient heuristics in structured domains. Previous attempts like mGPT applied classical planning heuristics to an all-outcome determinization of MDPs without a discount factor; yet, discounte...

Variable Independence in Markov Decision Problems

In decision-theoretic planning, the problem of planning under uncertainty is formulated as a multidimensional, or factored MDP. Traditional dynamic programming techniques are inefficient for solving factored MDPs whose state and action spaces are exponential in the number of the state and action variables, correspondingly. We focus on exploiting problems' structure imposed by variable independence...

Structure Learning in Ergodic Factored MDPs without Knowledge of the Transition Function's In-Degree

This paper introduces Learn Structure and Exploit RMax (LSE-RMax), a novel model based structure learning algorithm for ergodic factored-state MDPs. Given a planning horizon that satisfies a condition, LSE-RMax provably guarantees a return very close to the optimal return, with a high certainty, without requiring any prior knowledge of the in-degree of the transition function as input. LSE-RMax...

Publication date: 2013
